Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable logsdb for http_logs #646

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

inqueue
Copy link
Member

@inqueue inqueue commented Aug 29, 2024

Add support for logs data streams (logsdb) and data streams to the http logs track. Logsdb was added in ES 8.15.

* add new parameters for logsdb and data streams
* remove index.json
* add a composable template supporting standard indices, data streams, and logs data streams
* Update README
@inqueue inqueue self-assigned this Aug 29, 2024
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side.

}
"operation-type": {{ indexing_operation_type | default("create-index") | tojson }},
"settings": {%- if index_settings is defined %} {{index_settings | tojson}} {%- else %} {
"index.sort.field": "@timestamp",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the template, not sure what else can be a good sorting field without knowing the data, I think just sorting by timestamp is good.

Copy link
Contributor

@salvatore-campagna salvatore-campagna Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is basically nothing there @timestamp, message, clientip, request, status, size and geoip. At the end it depends on queries...the idea is that we want to sort to favor query latency. But sorting will also determine effective doc value compression.

Copy link
Contributor

@salvatore-campagna salvatore-campagna Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result I think we can use this as an example to udnerstand what happens if host.name is missing. I would just remove the index sorting configuration and rely on defaults. In practice since host.name is missing it would result in just sorting on @timestamp (the host.name field is injected and will be empty). LogsDB handles gracefully the fact that the host.name field might be missing.

@@ -42,6 +42,8 @@ This track allows to overwrite the following parameters with Rally 0.8.0+ using
* `number_of_shards` (default: 5)
* `source_enabled` (default: true): A boolean defining whether the `_source` field is stored in the index.
* `index_settings`: A list of index settings. Index settings defined elsewhere (e.g. `number_of_replicas`) need to be overridden explicitly.
* `index_mode` (default: unset): Set to `logsdb` to enable indexing to [logs data streams](https://www.elastic.co/guide/en/elasticsearch/reference/master/logs-data-stream.html). If not enabled, Rally will not use logs data streams.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could just enable logsdb on the plain index and not use data streams. This would make this change a little bit simpler, since this is an index oriented track?

But I think it is fine to use logsdb with data stream here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL logsdb can be enabled on a plain index. I took from https://www.elastic.co/guide/en/elasticsearch/reference/current/logs-data-stream.html it was required to be a data stream.

I would like to leave data stream support for serverless, though I think combining the template into one may have complicated things a bit, so I am thinking of splitting the template into two files: one for a plain index, the other for data stream. WDYT?

"dynamic": "strict",
"_source": {
"enabled": {{ source_enabled | default(true) | tojson }}
"priority": 101,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used 101 to ensure the template took priority over the built-in one for logs-*. Should it be 100 for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the fact that the standard ones should have 100...so it should be fine for this to have 101 (and have higher priority). I saw some CI failures saying that there is a template priority issue while merging templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see

Cannot run task [create-all-templates]: Request returned an error. Error type: api, Description: illegal_argument_exception ({'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'index template [rally-http_logs] has index patterns [logs-*, reindexed-logs] matching patterns from existing templates [apm-index-template] with patterns (apm-index-template => [traces-apm*, logs-apm*, metrics-apm*]) that have the same priority [101], multiple index templates may not match during index creation, please use a different priority'}], 'type': 'illegal_argument_exception', 'reason': 'index template [rally-http_logs] has index patterns [logs-*, reindexed-logs] matching patterns from existing templates [apm-index-template] with patterns (apm-index-template => [traces-apm*, logs-apm*, metrics-apm*]) that have the same priority [101], multiple index templates may not match during index creation, please use a different priority'}, 'status': 400}), HTTP Status: 400

@martijnvg
Copy link
Member

@inqueue Do you plan to merge this? I'm planning to run http_logs with logsdb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants